
@willg-nv willg-nv commented Dec 17, 2025

What does this PR do?

Type of change: new feature

Overview: This PR integrates the automated Q/DQ placement tool into ModelOpt. This PR is part 2 of 4 of the changes.

Part 1: #701
Part 2: #702
Part 3: #703
Part 4: #704

This PR contains the following changes:

  1. Implement RegionPattern to represent the topological structure of Regions. InsertionPoints are also defined on RegionPattern. Regions with the same pattern are optimized together.
  2. Implement the RegionSearch class to divide an ONNX graph into small regions.
  3. The RegionSearch Python file also provides an entry point that prints the region structures.
  4. Unit tests for the new classes.
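
To illustrate the idea that regions sharing a pattern are optimized together, here is a minimal sketch. It is not the RegionPattern implementation from this PR: the `pattern_signature` and `group_by_pattern` helpers are hypothetical, and the signature is assumed to be just the ordered list of op types, whereas the real class represents full topology.

```python
from collections import defaultdict


def pattern_signature(op_types):
    """Hypothetical signature: the ordered op types of a region joined into a string."""
    return "->".join(op_types)


def group_by_pattern(regions):
    """Group region ids by signature so each pattern is handled once."""
    groups = defaultdict(list)
    for region_id, op_types in regions.items():
        groups[pattern_signature(op_types)].append(region_id)
    return dict(groups)


# Illustrative data only: region id -> op types in that region.
regions = {
    0: ["Conv", "BatchNormalization", "Relu"],
    1: ["MatMul", "Add"],
    2: ["Conv", "BatchNormalization", "Relu"],
}
groups = group_by_pattern(regions)
```

With this grouping, a decision made for one pattern (for example, where to insert Q/DQ nodes) could be applied to every region in the same group.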

Usage

python -m modelopt.onnx.quantization.autotune.region_search --model model.onnx --verbose

Example output:

    ├─ Region 212 (Level 0, Type: COMPOSITE)
    │  ├─ Direct nodes: 0
    │  ├─ Total nodes (recursive): 9
    │  ├─ Children: 1
    │  ├─ Inputs: 3 tensors
    │  │    - xxx
    │  │    - xxx
    │  │    - xxx
    │  └─ Outputs: 1 tensors
    │       - xxx
    │
    │  Child regions:
    │
      ├─ Region 209 (Level 2, Type: LEAF) 
      │  ├─ Direct nodes: 9
      │  ├─ Total nodes (recursive): 9
      │  ├─ Children: 0
      │  ├─ Inputs: 11 tensors
      │  │    - xxx

Testing

Implemented unit tests for the new classes. All unit tests pass locally.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No, documentation changes will be in Part 4.
  • Did you update Changelog?: No. The changelog update will be included in Part 4.

Additional Information

@willg-nv willg-nv requested a review from a team as a code owner December 17, 2025 06:29
@willg-nv willg-nv requested a review from ajrasane December 17, 2025 06:29

copy-pr-bot bot commented Dec 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@willg-nv willg-nv changed the title Dev willg integrate auto qdq placement part2 Integrate Automated QDQ placement tool - Part 2 Dec 17, 2025
@willg-nv
Author

Hi @ajrasane, could you help review this PR? Thanks!

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch from 3f7ff31 to d3a6765 Compare December 31, 2025 02:16
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch 2 times, most recently from 616285d to c95939a Compare January 8, 2026 08:35
quantized_tensors = set()

for node in onnx_model.graph.node:
    if node.op_type == "QuantizeLinear":
Contributor

If --dq_only is enabled, a DQ node may be the only indication that a tensor is quantized. Please verify that this function supports those cases.
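
A minimal sketch of the case being raised, using plain stand-in nodes instead of a real ONNX model. The `Node` type and `collect_quantized_tensors` helper are hypothetical; the choice to record the Q node's input and, for a DQ node with no upstream Q, the DQ node's input is an assumption, not the behavior of the function under review.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """Stand-in for an ONNX node, using the same field names (op_type, input, output)."""
    op_type: str
    input: list
    output: list


def collect_quantized_tensors(nodes):
    """Collect tensors marked as quantized by Q nodes or by DQ-only nodes."""
    q_outputs = {n.output[0] for n in nodes if n.op_type == "QuantizeLinear"}
    quantized = set()
    for n in nodes:
        if n.op_type == "QuantizeLinear":
            # Regular Q/DQ pair: the Q node's input is the quantized tensor.
            quantized.add(n.input[0])
        elif n.op_type == "DequantizeLinear" and n.input[0] not in q_outputs:
            # DQ-only case (assumption): no Q node feeds this DQ node,
            # so the DQ input itself is the stored quantized tensor.
            quantized.add(n.input[0])
    return quantized


nodes = [
    Node("QuantizeLinear", ["x", "x_scale", "x_zp"], ["x_q"]),
    Node("DequantizeLinear", ["x_q", "x_scale", "x_zp"], ["x_dq"]),
    Node("DequantizeLinear", ["w_q", "w_scale", "w_zp"], ["w_dq"]),  # DQ-only weight
    Node("Conv", ["x_dq", "w_dq"], ["y"]),
]
result = collect_quantized_tensors(nodes)
```

A check that looks only at `QuantizeLinear` would miss `w_q` in this example.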

See

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch 3 times, most recently from 4468ca2 to bc87ca7 Compare January 9, 2026 05:02
from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_node_indices

# Module logger
logger = logging.getLogger(__name__)
Contributor

Comment on lines +233 to +236
divergent_outputs = [
    out.name for out in node.outputs if self._is_tensor_divergent(out.name)
]
is_divergent = len(divergent_outputs) > 0
Contributor

This can be simplified to:

is_divergent = any(self._is_tensor_divergent(out.name) for out in node.outputs)
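
A small standalone sketch of the difference between the two forms. The function names and the `users` data are hypothetical; a tensor is assumed to be divergent when it has more than one consumer.

```python
def is_divergent_list(output_names, is_tensor_divergent):
    """Original form: build the full list, then test its length."""
    divergent = [name for name in output_names if is_tensor_divergent(name)]
    return len(divergent) > 0


def is_divergent_any(output_names, is_tensor_divergent):
    """Suggested form: any() stops at the first divergent output."""
    return any(is_tensor_divergent(name) for name in output_names)


# Illustrative data: tensor name -> indices of consuming nodes.
users = {"a": [1, 2], "b": [3], "c": [4, 5]}
calls = []


def is_tensor_divergent(name):
    calls.append(name)  # record each predicate call to compare the two forms
    return len(users[name]) > 1


outputs = ["a", "b", "c"]
list_result = is_divergent_list(outputs, is_tensor_divergent)
calls_list = len(calls)
calls.clear()
any_result = is_divergent_any(outputs, is_tensor_divergent)
calls_any = len(calls)
```

Both forms return the same boolean, but the `any()` form evaluates the predicate only until the first match. The list form remains necessary if `divergent_outputs` is used later in the function.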

for next_node_idx in self.tensor_users_map[output.name]:
    if next_node_idx not in reachable:
        reachable[next_node_idx] = distance + 1
        queue.append((next_node_idx, distance + 1))
Contributor

nit: can we skip adding nodes to the queue when distance + 1 >= max_steps?

Author

I don't think this extra check is needed. They will be skipped at Line 285 when they are popped.

            if distance >= max_steps:
                continue
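
The two options discussed here can be compared with a small depth-bounded BFS sketch. This is not the code from the PR: `reachable_within` and its `prune_on_push` flag are hypothetical, and the graph is a plain adjacency dict rather than the tensor users map.

```python
from collections import deque


def reachable_within(successors, start, max_steps, prune_on_push=False):
    """Return (node -> distance map, number of queue pops) for a bounded BFS."""
    reachable = {start: 0}
    queue = deque([(start, 0)])
    pops = 0
    while queue:
        node, distance = queue.popleft()
        pops += 1
        if distance >= max_steps:
            continue  # skip-on-pop: nodes at the limit are never expanded
        for nxt in successors.get(node, []):
            if nxt not in reachable:
                reachable[nxt] = distance + 1
                if prune_on_push and distance + 1 >= max_steps:
                    continue  # skip-on-push: do not enqueue nodes at the limit
                queue.append((nxt, distance + 1))
    return reachable, pops


# Illustrative chain 0 -> 1 -> 2 -> 3 -> 4.
chain = {0: [1], 1: [2], 2: [3], 3: [4]}
on_pop, pops_on_pop = reachable_within(chain, 0, max_steps=2)
on_push, pops_on_push = reachable_within(chain, 0, max_steps=2, prune_on_push=True)
```

In this sketch both variants produce the same reachable map; pruning on push only saves the queue operations for nodes sitting exactly at the step limit.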

2. All nodes between divergence and convergence

**Algorithm:**
1. Identify all branches from the divergent node
Contributor

Is it a mandatory criterion that a region must start with a divergent node and end with a convergent node?

Author

  • a region must start with a divergent node
    Yes. When the linear probe reaches a divergent node, RegionSearch always tries to create a new region.
  • a region must end with a convergent node
    If the convergent node is too far away (>= 10 steps), RegionSearch treats the current divergent node as an orphan and tries to probe from its output branches.
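
A rough sketch of the behavior described in this reply, under stated assumptions: the convergent node is taken to be the nearest node reachable from every branch of the divergent node, and a result of `None` stands for the orphan case. The `distances_from` and `find_convergence` helpers are hypothetical and are not the RegionSearch implementation.

```python
from collections import deque


def distances_from(successors, start, max_steps):
    """BFS distances from start, not expanding nodes at max_steps or beyond."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] >= max_steps:
            continue
        for nxt in successors.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def find_convergence(successors, divergent, max_steps=10):
    """Return the nearest node reachable from all branches, or None (orphan)."""
    branches = successors.get(divergent, [])
    if len(branches) < 2:
        return None  # not a divergent node
    per_branch = [distances_from(successors, b, max_steps) for b in branches]
    common = set(per_branch[0]).intersection(*per_branch[1:])
    # Branch heads are one step from the divergent node, hence the "1 +".
    near = [n for n in common if 1 + max(d[n] for d in per_branch) < max_steps]
    if not near:
        return None  # too far away: treat the divergent node as an orphan
    return min(near, key=lambda n: (max(d[n] for d in per_branch), n))


# Illustrative diamond: 0 diverges into 1 and 2, which converge at 3.
diamond = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
found = find_convergence(diamond, 0)
orphan = find_convergence(diamond, 0, max_steps=2)
```

When `find_convergence` returns `None`, a caller following the description above would continue probing from each output branch separately.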


# Share the tensor users map from Phase 1 to avoid recomputation.
# This map is expensive to build and is shared across all refinements.
region_builder.tensor_users_map = region_partitioner.tensor_users_map
Contributor

Can we also share the forward_reachable_nodes map from Phase 1 to avoid recomputation?
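
The sharing pattern in question can be sketched generically. The `Phase` class and `build_tensor_users_map` helper below are hypothetical stand-ins, not the partitioner or builder classes from this PR; the same approach would apply to any other precomputed map.

```python
from collections import defaultdict


def build_tensor_users_map(nodes):
    """Map each tensor name to the indices of the nodes that consume it."""
    users = defaultdict(list)
    for idx, (inputs, _outputs) in enumerate(nodes):
        for name in inputs:
            users[name].append(idx)
    return dict(users)


class Phase:
    """Hypothetical phase object that can reuse a map built by an earlier phase."""

    def __init__(self, nodes, tensor_users_map=None):
        self.nodes = nodes
        if tensor_users_map is None:
            tensor_users_map = build_tensor_users_map(nodes)  # computed once
        self.tensor_users_map = tensor_users_map


# Illustrative graph: each node is (input names, output names).
nodes = [(["x"], ["a"]), (["a"], ["b"]), (["a", "b"], ["y"])]
partitioner = Phase(nodes)
builder = Phase(nodes, tensor_users_map=partitioner.tensor_users_map)
```

Both objects hold a reference to the same dict, so the map is built once. Sharing is safe only if neither phase mutates the map.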

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch from bc87ca7 to 6f809d7 Compare January 15, 2026 08:13
